The Learning Problem Course Introduction


Course Design (1/2) 


Machine Learning: a mixture of theoretical and practical tools 


• theory oriented
  • derive everything deeply for solid understanding
  • less interesting to general audience

• techniques oriented
  • flash over the sexiest techniques broadly for shiny coverage
  • too many techniques, hard to choose, hard to use properly









our approach: foundation oriented 




Course Design (2/2) 


Foundation Oriented ML Course 





• mixture of philosophical illustrations, key theory, core techniques, usage in practice, and hopefully jokes :-)
  (what every machine learning user should know)

• story-like:
  • When Can Machines Learn? (illustrative + technical)
  • Why Can Machines Learn? (theoretical + illustrative)
  • How Can Machines Learn? (technical + practical)
  • How Can Machines Learn Better? (practical + theoretical)

allows students to learn ‘future/untaught’ techniques or study deeper theory easily


Course History 


NTU Version
• 15-17 weeks (2+ hours)
• highly-praised with English and blackboard teaching

Coursera Version
• 8 weeks of ‘foundation’ (this course) + 7 weeks of ‘techniques’ (coming course)
• Mandarin teaching to reach more audience in need
• slides teaching improved with Coursera’s quiz and homework mechanisms

goal: try making Coursera version even better than NTU version


Fun Time 


Which of the following descriptions of this course is true?

(1) the course will be taught in Taiwanese
(2) the course will tell me the techniques that create the android Lieutenant Commander Data in Star Trek
(3) the course will be 15 weeks long
(4) the course will be story-like

Reference Answer: (4)
(1) no, my Taiwanese is unfortunately not good enough for teaching (yet)
(2) no, although what we teach may serve as foundations of those (future) techniques
(3) no, unless you choose to join the next course
(4) yes, let’s begin the story
Roadmap
(1) When Can Machines Learn?

Lecture 1: The Learning Problem
• Course Introduction
• What is Machine Learning
• Applications of Machine Learning
• Components of Machine Learning
• Machine Learning and Other Fields

(2) Why Can Machines Learn?
(3) How Can Machines Learn?
(4) How Can Machines Learn Better?


The Learning Problem What is Machine Learning 


From Learning to Machine Learning 


learning: acquiring skill with experience accumulated from observations

observations → learning → skill

machine learning: acquiring skill with experience accumulated/computed from data

data → ML → skill

What is skill?










A More Concrete Definition 


skill ⇔ improve some performance measure (e.g. prediction accuracy)

machine learning: improving some performance measure with experience computed from data

data → ML → improved performance measure
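To make ‘performance measure’ concrete, here is a minimal sketch of prediction accuracy, the example measure named above. The function name `accuracy`, the +1/-1 label encoding, and the toy predictions are illustrative assumptions, not part of the lecture.

```python
from typing import Sequence


def accuracy(predictions: Sequence[int], targets: Sequence[int]) -> float:
    """Fraction of predictions that agree with the targets."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if len(targets) == 0:
        raise ValueError("need at least one example")
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)


# Hypothetical labels (assumption): +1 = good outcome, -1 = bad outcome.
targets = [+1, -1, +1, +1, -1]
before_learning = [+1, +1, +1, +1, +1]  # a rule that always predicts +1
after_learning = [+1, -1, +1, -1, -1]   # a rule with one mistake

print(accuracy(before_learning, targets))
print(accuracy(after_learning, targets))
```

In this toy setting, ‘learning’ counts as successful if the second number is larger than the first.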





An Application in Computational Finance 


stock data → ML → more investment gain

Why use machine learning?


Yet Another Application: Tree Recognition 


• ‘define’ trees and hand-program: difficult
• learn from data (observations) and recognize: a 3-year-old can do so
• ‘ML-based tree recognition system’ can be easier to build than hand-programmed system

ML: an alternative route to build complicated systems


The Machine Learning Route 


ML: an alternative route to build complicated systems 


Some Use Scenarios 
• when humans cannot program the system manually: navigating on Mars
• when humans cannot ‘define the solution’ easily: speech/visual recognition
• when needing rapid decisions that humans cannot make: high-frequency trading
• when needing to be user-oriented on a massive scale: consumer-targeted marketing

Give a computer a fish, you feed it for a day;
teach it how to fish, you feed it for a lifetime. :-)


Key Essence of Machine Learning 


machine learning: improving some performance measure with experience computed from data

data → ML → improved performance measure

(1) exists some ‘underlying pattern’ to be learned, so ‘performance measure’ can be improved
(2) but no programmable (easy) definition, so ‘ML’ is needed
(3) somehow there is data about the pattern, so ML has some ‘inputs’ to learn from

key essence: help decide whether to use ML


Fun Time 
Which of the following is best suited for machine learning?

(1) predicting whether the next cry of the baby girl happens at an even-numbered minute or not
(2) determining whether a given graph contains a cycle
(3) deciding whether to approve credit card to some customer
(4) guessing whether the earth will be destroyed by the misuse of nuclear power in the next ten years

Reference Answer: (3)
(1) no pattern
(2) programmable definition
(3) pattern: customer behavior; definition: not easily programmable; data: history of bank operation
(4) arguably no (or not enough) data yet
The Learning Problem Applications of Machine Learning


Daily Needs: Food, Clothing, Housing, Transportation 


data → ML → skill

(1) Food (Sadilek et al., 2013)
  • data: Twitter data (words + location)
  • skill: tell food poisoning likeliness of restaurant properly

(2) Clothing (Abu-Mostafa, 2012)
  • data: sales figures + client surveys
  • skill: give good fashion recommendations to clients

(3) Housing (Tsanas and Xifara, 2012)
  • data: characteristics of buildings and their energy load
  • skill: predict energy load of other buildings closely

(4) Transportation (Stallkamp et al., 2012)
  • data: some traffic sign images and meanings
  • skill: recognize traffic signs accurately

ML is everywhere!


Education 


data → ML → skill

• data: students’ records on quizzes on a Math tutoring system
• skill: predict whether a student can give a correct answer to another quiz question

A Possible ML Solution

answer correctly ≈ [recent strength of student > difficulty of question]
• give ML 9 million records from 3000 students
• ML determines (reverse-engineers) strength and difficulty automatically

key part of the world-champion system from National Taiwan Univ. in KDDCup 2010
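The ‘strength versus difficulty’ idea can be sketched on toy data. The code below is only an illustration under stated assumptions, not the KDDCup 2010 system: it assumes one fixed strength per student and one difficulty per question, replaces the hard comparison with a logistic model so that gradient steps are possible, and uses made-up records. All names (`fit_strength_difficulty`, `predict_correct`) are hypothetical.

```python
import math
import random


def fit_strength_difficulty(records, n_students, n_questions,
                            epochs=200, lr=0.1, seed=0):
    """Estimate a strength per student and a difficulty per question.

    records: list of (student_id, question_id, correct), correct in {0, 1}.
    Assumed model: P(correct) = sigmoid(strength - difficulty).
    """
    rng = random.Random(seed)
    strength = [0.0] * n_students
    difficulty = [0.0] * n_questions
    records = list(records)
    for _ in range(epochs):
        rng.shuffle(records)
        for s, q, correct in records:
            p = 1.0 / (1.0 + math.exp(-(strength[s] - difficulty[q])))
            grad = correct - p  # gradient of the log-likelihood w.r.t. strength
            strength[s] += lr * grad
            difficulty[q] -= lr * grad
    return strength, difficulty


def predict_correct(strength, difficulty, s, q):
    """The rule from the slide: correct if strength exceeds difficulty."""
    return strength[s] > difficulty[q]


# Made-up records: student 0 is strongest, question 2 is hardest.
records = [
    (0, 0, 1), (0, 1, 1), (0, 2, 1),
    (1, 0, 1), (1, 1, 1), (1, 2, 0),
    (2, 0, 1), (2, 1, 0), (2, 2, 0),
]
strength, difficulty = fit_strength_difficulty(records, 3, 3)
print(strength, difficulty)
print(predict_correct(strength, difficulty, 0, 0))
```

Nobody tells the program which student is strong or which question is hard; both are inferred from the answer records alone.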


Entertainment: Recommender System (1/2) 


data → ML → skill
• data: how many users have rated some movies
• skill: predict how a user would rate an unrated movie

A Hot Problem

• competition held by Netflix in 2006
  • 100,480,507 ratings that 480,189 users gave to 17,770 movies
  • 10% improvement = 1 million dollar prize
• similar competition (movies → songs) held by Yahoo! in KDDCup 2011
  • 252,800,275 ratings that 1,000,990 users gave to 624,961 songs

How can machines learn our preferences?


Entertainment: Recommender System (2/2) 


A Possible ML Solution 










































[Figure: match movie and viewer factors, add contributions from each factor, get predicted rating]

• pattern: rating ← viewer/movie factors
• learning: known rating → learned factors → unknown rating prediction

key part of the world-champion (again!) system from National Taiwan Univ. in KDDCup 2011
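The ‘known rating → learned factors → unknown rating prediction’ flow can be sketched as a small factor model. This is a minimal illustration, not the KDDCup 2011 system: it assumes the predicted rating is the inner product of a viewer factor vector and a movie factor vector, fits the factors by plain stochastic gradient descent without regularization, and uses made-up ratings. All function names are hypothetical.

```python
import random


def predict(user_f, movie_f, u, m):
    """Predicted rating: sum of per-factor contributions (inner product)."""
    return sum(a * b for a, b in zip(user_f[u], movie_f[m]))


def squared_error(user_f, movie_f, ratings):
    """Mean squared error on a list of (user, movie, rating) triples."""
    return sum((r - predict(user_f, movie_f, u, m)) ** 2
               for u, m, r in ratings) / len(ratings)


def learn_factors(ratings, n_users, n_movies, n_factors=2,
                  epochs=500, lr=0.02, seed=0):
    """Learn viewer and movie factors from known ratings by SGD."""
    rng = random.Random(seed)
    user_f = [[rng.uniform(0.0, 1.0) for _ in range(n_factors)]
              for _ in range(n_users)]
    movie_f = [[rng.uniform(0.0, 1.0) for _ in range(n_factors)]
               for _ in range(n_movies)]
    for _ in range(epochs):
        for u, m, r in ratings:
            err = r - predict(user_f, movie_f, u, m)
            for k in range(n_factors):
                pu, qm = user_f[u][k], movie_f[m][k]
                user_f[u][k] += lr * err * qm   # move viewer factor
                movie_f[m][k] += lr * err * pu  # move movie factor
    return user_f, movie_f


# Made-up known ratings: (user, movie, rating).
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
user_f, movie_f = learn_factors(ratings, n_users=3, n_movies=3)

print(squared_error(user_f, movie_f, ratings))  # error on known ratings
print(predict(user_f, movie_f, 0, 2))           # prediction for an unrated pair
```

The factors themselves are never given as input; they are whatever numbers best explain the known ratings, which is the ‘reverse-engineering’ flavor of this approach.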







Fun Time 


Which of the following fields cannot use machine learning?
(1) Finance
(2) Medicine
(3) Law
(4) none of the above

Reference Answer: (4)
(1) predict stock price from data
(2) predict medicine effect from data
(3) summarize legal documents from data
(4) :-) Welcome to study this hot topic!
The Learning Problem Components of Machine Learning


Components of Learning: 
Metaphor Using Credit Approval 


Applicant Information 



































age                 23 years
gender              female
annual salary       NTD 1,000,000
year in residence   1 year
year in job         0.5 year
current debt        200,000

unknown pattern to be learned:
‘approve credit card good for bank?’




Formalize the Learning Problem 


Basic Notations 

• input: $\mathbf{x} \in \mathcal{X}$ (customer application)
• output: $y \in \mathcal{Y}$ (good/bad after approving credit card)
• unknown pattern to be learned ⇔ target function:
  $f\colon \mathcal{X} \to \mathcal{Y}$ (ideal credit approval formula)
• data ⇔ training examples: $\mathcal{D} = \{(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_N, y_N)\}$
  (historical records in bank)
• hypothesis ⇔ skill with hopefully good performance:
  $g\colon \mathcal{X} \to \mathcal{Y}$ (‘learned’ formula to be used)

$\{(\mathbf{x}_n, y_n)\}$ from $f$ → ML → $g$
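The notation above maps directly onto simple types. The sketch below is one possible encoding under stated assumptions: the `Application` fields follow the applicant table, the +1/-1 label encoding is an assumption, and the records and the salary rule `g` are made up for illustration. The target $f$ does not appear in the code because it is unknown; only examples generated from it do.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Application:
    """One input x in X: a customer application."""
    age: float
    gender: str
    annual_salary: float
    years_in_residence: float
    years_in_job: float
    current_debt: float


Label = int                                   # y in Y: +1 = good, -1 = bad (assumed encoding)
Hypothesis = Callable[[Application], Label]   # any formula mapping X to Y
Dataset = List[Tuple[Application, Label]]     # D = {(x_1, y_1), ..., (x_N, y_N)}


def disagreement(g: Hypothesis, data: Dataset) -> float:
    """Fraction of examples in D on which g differs from the recorded y."""
    return sum(g(x) != y for x, y in data) / len(data)


# Made-up historical records (assumption, for illustration only).
D: Dataset = [
    (Application(23, "female", 1_000_000, 1, 0.5, 200_000), +1),
    (Application(41, "male", 500_000, 10, 8, 50_000), -1),
    (Application(35, "female", 900_000, 3, 2, 400_000), -1),
    (Application(29, "male", 700_000, 5, 4, 0), +1),
]


def g(x: Application) -> Label:
    """A hypothetical candidate formula: approve on high salary."""
    return +1 if x.annual_salary > 800_000 else -1


print(disagreement(g, D))
```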




Learning Flow for Credit Approval 


unknown target function $f\colon \mathcal{X} \to \mathcal{Y}$
(ideal credit approval formula)
↓
training examples $\mathcal{D}\colon (\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)$
(historical records in bank)
↓
learning algorithm $\mathcal{A}$
↓
final hypothesis $g \approx f$
(‘learned’ formula to be used)

• target $f$ unknown (i.e. no programmable definition)
• hypothesis $g$ hopefully $\approx f$ but possibly different from $f$
  (perfection ‘impossible’ when $f$ unknown)

What does $g$ look like?
Hsuan-Tien Lin (NTU CSIE)





The Learning Model

training examples $\mathcal{D}\colon (\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)$
(historical records in bank)
↓
learning algorithm $\mathcal{A}$ ← hypothesis set $\mathcal{H}$ (set of candidate formula)
↓
final hypothesis $g \approx f$
(‘learned’ formula to be used)

• assume $g \in \mathcal{H} = \{h_k\}$, i.e. approving if
  • $h_1$: annual salary > NTD 800,000
  • $h_2$: debt > NTD 100,000 (really?)
  • $h_3$: year in job < 2 (really?)
• hypothesis set $\mathcal{H}$:
  • can contain good or bad hypotheses
  • up to $\mathcal{A}$ to pick the ‘best’ one as $g$

learning model = $\mathcal{A}$ and $\mathcal{H}$
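A minimal sketch of ‘learning model = $\mathcal{A}$ and $\mathcal{H}$’: the hypothesis set holds the three candidate rules listed above, and the algorithm picks the candidate with the fewest mistakes on the training examples. Picking by mistake count is only one simple choice of $\mathcal{A}$, assumed here for illustration; the records, the dictionary keys, and the function name `learning_algorithm` are also assumptions.

```python
from typing import Callable, Dict, List, Tuple

Applicant = Dict[str, float]
Hypothesis = Callable[[Applicant], bool]  # True = approve

# Hypothesis set H: the three candidate formulas from the slide.
H: Dict[str, Hypothesis] = {
    "h1": lambda x: x["annual_salary"] > 800_000,
    "h2": lambda x: x["debt"] > 100_000,
    "h3": lambda x: x["years_in_job"] < 2,
}


def learning_algorithm(D: List[Tuple[Applicant, bool]],
                       H: Dict[str, Hypothesis]) -> Tuple[str, Hypothesis]:
    """A: return the hypothesis in H with the fewest mistakes on D."""
    def mistakes(h: Hypothesis) -> int:
        return sum(h(x) != y for x, y in D)

    best = min(H, key=lambda name: mistakes(H[name]))
    return best, H[best]


# Made-up training examples: (applicant, was approving good for the bank?).
D = [
    ({"annual_salary": 1_000_000, "debt": 200_000, "years_in_job": 0.5}, True),
    ({"annual_salary": 400_000, "debt": 300_000, "years_in_job": 1.0}, False),
    ({"annual_salary": 900_000, "debt": 50_000, "years_in_job": 5.0}, True),
    ({"annual_salary": 600_000, "debt": 0, "years_in_job": 8.0}, False),
]

name, g = learning_algorithm(D, H)
print(name)
```

On these made-up records the salary rule makes no mistakes while the other two make two each, so it is returned as $g$. With different records a different candidate could win, which is the sense in which $\mathcal{H}$ ‘can contain good or bad hypotheses’.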


Practical Definition of Machine Learning 


unknown target function $f\colon \mathcal{X} \to \mathcal{Y}$
(ideal credit approval formula)
↓
training examples $\mathcal{D}\colon (\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)$
(historical records in bank)
↓
learning algorithm $\mathcal{A}$ ← hypothesis set $\mathcal{H}$ (set of candidate formula)
↓
final hypothesis $g \approx f$
(‘learned’ formula to be used)

machine learning:
use data to compute hypothesis $g$
that approximates target $f$



Fun Time 
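How to use the four sets below to form a learning problem for song recommendation?

$S_1 = [0, 100]$
$S_2$ = all possible (userid, songid) pairs
$S_3$ = all formulas that ‘multiply’ user factors & song factors, indexed by all possible combinations of such factors
$S_4$ = 1,000,000 pairs of ((userid, songid), rating)

(1) $S_1 = \mathcal{X}$, $S_2 = \mathcal{Y}$, $S_3 = \mathcal{H}$, $S_4 = \mathcal{D}$
(2) $S_1 = \mathcal{Y}$, $S_2 = \mathcal{X}$, $S_3 = \mathcal{H}$, $S_4 = \mathcal{D}$
(3) $S_1 = \mathcal{D}$, $S_2 = \mathcal{H}$, $S_3 = \mathcal{Y}$, $S_4 = \mathcal{X}$
(4) $S_1 = \mathcal{X}$, $S_2 = \mathcal{D}$, $S_3 = \mathcal{Y}$, $S_4 = \mathcal{H}$

Reference Answer: (2)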
The Learning Problem Machine Learning and Other Fields


Machine Learning and Data Mining 










Machine Learning: use data to compute hypothesis $g$ that approximates target $f$
Data Mining: use (huge) data to find property that is interesting

• if ‘interesting property’ same as ‘hypothesis that approximates target’:
  ML = DM (usually what KDDCup does)
• if ‘interesting property’ related to ‘hypothesis that approximates target’:
  DM can help ML, and vice versa (often, but not always)
• traditional DM also focuses on efficient computation in large database

difficult to distinguish ML and DM in reality


Machine Learning and Artificial Intelligence 










Machine Learning: use data to compute hypothesis $g$ that approximates target $f$
Artificial Intelligence: compute something that shows intelligent behavior

• $g \approx f$ is something that shows intelligent behavior:
  ML can realize AI, among other routes
• e.g. chess playing
  • traditional AI: game tree
  • ML for AI: ‘learning from board data’

ML is one possible route to realize AI




Machine Learning and Statistics 











Machine Learning: use data to compute hypothesis $g$ that approximates target $f$
Statistics: use data to make inference about an unknown process

• $g$ is an inference outcome; $f$ is something unknown:
  statistics can be used to achieve ML
• traditional statistics also focuses on provable results with math assumptions, and cares less about computation

statistics: many useful tools for ML




Fun Time 





Which of the following claims is not totally true?
(1) machine learning is a route to realize artificial intelligence
(2) machine learning, data mining and statistics all need data
(3) data mining is just another name for machine learning
(4) statistics can be used for data mining

Reference Answer: (3)

While data mining and machine learning do
share a huge overlap, they are arguably not
equivalent because of the difference of focus.
Summary
(1) When Can Machines Learn?

Lecture 1: The Learning Problem
• Course Introduction: foundation oriented and story-like
• What is Machine Learning: use data to approximate target
• Applications of Machine Learning: almost everywhere
• Components of Machine Learning: $\mathcal{A}$ takes $\mathcal{D}$ and $\mathcal{H}$ to get $g$
• Machine Learning and Other Fields: related to DM, AI and Stats

• next: a simple and yet useful learning model ($\mathcal{H}$ and $\mathcal{A}$)

(2) Why Can Machines Learn?
(3) How Can Machines Learn?
(4) How Can Machines Learn Better?





